Update Iceberg table statistics on inserts #15441
Conversation
If we move stats collection into Iceberg, we could do this automatically.
How can we move stats collection into Iceberg?
Could you also add information to the documentation?
We also need to document `iceberg.extended-statistics.enabled` before we document `iceberg.extended-statistics.collect-on-write`. Let's follow up.
Note that Delta's `delta.extended-statistics.collect-on-write` isn't documented either, so you may want to document it.
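For context, these catalog configuration properties would be set in the Iceberg catalog's properties file (e.g. `etc/catalog/iceberg.properties`). The property names are the ones named in the comment above; the values shown are only an example, not the documented defaults:

```properties
# Allow collection and use of extended statistics at all (prerequisite)
iceberg.extended-statistics.enabled=true
# Collect extended statistics automatically on writes
iceberg.extended-statistics.collect-on-write=true
```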
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsWriter.java
This doesn't seem like a safe assumption, unless I'm missing something.
You're walking the snapshot history back until you find all the columns you're looking for, so you may find duplicates of columns you have already found.
Isn't `toRead` pre-filtered?
I think it is, so I'll ignore this for now.
You are right, I missed that filter line.
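The behavior discussed in this thread, walking snapshots from newest to oldest and only taking stats for columns not yet resolved, can be sketched as below. The types and names here are illustrative stand-ins (per-snapshot stats as plain maps), not the actual `TableStatisticsWriter` API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StatsLookup
{
    // Walk per-snapshot stats newest-to-oldest; take stats only for columns
    // still missing, so older duplicates of already-found columns are skipped.
    public static Map<String, String> resolveStats(
            List<Map<String, String>> snapshotsNewestFirst,
            Set<String> wantedColumns)
    {
        Map<String, String> found = new LinkedHashMap<>();
        for (Map<String, String> snapshotStats : snapshotsNewestFirst) {
            for (Map.Entry<String, String> entry : snapshotStats.entrySet()) {
                // the set of columns to read is effectively pre-filtered:
                // skip anything already resolved from a newer snapshot
                if (wantedColumns.contains(entry.getKey()) && !found.containsKey(entry.getKey())) {
                    found.put(entry.getKey(), entry.getValue());
                }
            }
            if (found.keySet().containsAll(wantedColumns)) {
                break; // all requested columns covered; stop walking history
            }
        }
        return found;
    }
}
```

Because already-found columns are filtered out before reading older snapshots, the newest available stats win and no duplicates are collected, which is the point resolved above.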
`ByteBuffer.array()` is a convenient and efficient way to get bytes from a `ByteBuffer`. It has, however, numerous preconditions that should be checked before using the returned array. The commit replaces all `ByteBuffer.array()` usages where these preconditions are assumed to be true.
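A minimal sketch of the kind of precondition checking involved. This is a hypothetical helper, not the commit's actual code: `array()` is only valid as a direct view of the content when the buffer is heap-backed, has no array offset, and the readable region spans the whole backing array; otherwise the bytes must be copied out:

```java
import java.nio.ByteBuffer;

public class ByteBufferBytes
{
    // Returns the buffer's readable bytes. Uses array() only when its
    // preconditions hold (heap-backed, zero offset, content spans the whole
    // backing array); otherwise copies the remaining bytes explicitly.
    public static byte[] getBytes(ByteBuffer buffer)
    {
        if (buffer.hasArray()
                && buffer.arrayOffset() == 0
                && buffer.position() == 0
                && buffer.limit() == buffer.array().length) {
            return buffer.array();
        }
        byte[] copy = new byte[buffer.remaining()];
        buffer.duplicate().get(copy); // duplicate() leaves the original buffer's position untouched
        return copy;
    }
}
```

Note that a direct buffer (no backing array) or a sliced buffer (non-zero `arrayOffset()`) would throw or return wrong bytes if `array()` were used unconditionally.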
- Prefer `assertThat(List).hasSize` to `assertThat(list.size())`, as the former includes the list's elements upon failure.
- Drop the table after the test.
…tion

Before the change, the test used the `getAllMetadataFilesFromTableDirectory` utility, but it listed all (metadata and data) files. A similar method, `getAllMetadataFilesFromTableDirectoryForTable`, which lists only metadata files, existed but was not used in the test. The change merges the utility methods: the correct logic comes from `getAllMetadataFilesFromTableDirectoryForTable` (so there is no behavior change for tests other than `testCleaningUpWithTableWithSpecifiedLocation`), and the name comes from `getAllMetadataFilesFromTableDirectory`.
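The distinction the commit fixes can be sketched as a path filter. This helper is illustrative only (the `<tableLocation>/metadata/...` layout is the standard Iceberg table layout; the method name is not the actual test utility):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MetadataFiles
{
    // Keep only files under the table's metadata/ directory,
    // dropping data files that a plain directory listing would include.
    public static List<String> metadataFilesOnly(List<String> allFiles, String tableLocation)
    {
        String metadataPrefix = tableLocation + "/metadata/";
        return allFiles.stream()
                .filter(path -> path.startsWith(metadataPrefix))
                .collect(Collectors.toList());
    }
}
```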
The Iceberg connector will collect stats during writes, and it would be good to test this together with ANALYZE. Rename the class to cover all related functionality.
All Iceberg connector test classes match `TestIceberg*ConnectorTest` pattern.
Rebased to resolve conflicts and rerun the flaky(?) Cassandra test.
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4440329605
/test-with-secrets sha=6d192be5d21b16f717b144ae8edc78ac08c113c2
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4467931974
cc @hashhar
/test-with-secrets sha=fb405ea87bdffff9e66a8d7717cd86b5a99bc74d
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4469322142
BigQuery failures, unrelated.
"('mommy', 4)," +
"('moscow', 5)," +
"('Kielce', 4)," +
"('Kiev', 5)," +
This is much nicer :) But just wanted to point out that the preferred romanization is "Kyiv". Sadly, there's no timezone "Europe/Kyiv".